1 Performance and dependability evaluation of scalable massively parallel 97%
computer systems with conjoint simulation

Axel Hein , Mario Dal Cin 

ACM Transactions on Modeling and Computer Simulation (TOMACS) October 1998 
Volume 8 Issue 4 

Computer systems are becoming more and more a part of our daily life; business and 
industry rely on their service, and the health of human beings depends on their correct 
functioning. Computer systems used for critical tasks have to be carefully designed
and tested during the early design stage, the prototype phase, and their operational 
life. Methods and tools are required to support and facilitate this vital task. In this 
article, we tackle the issue of system-level performance and depen ... 



2 Fast cluster failover using virtual memory-mapped communication 91%

Yuanyuan Zhou , Peter M. Chen , Kai Li 
Proceedings of the 13th international conference on Supercomputing May 1999



3 Economical Fault-Tolerant Networks 87% 

Ali Raza Butt , Jahangir Hasan , Kamran Khalid , Farhan-ud-din Mirza
Linux Journal June 2000

A software solution to achieve fault tolerance, capitalizing on redundant replication of
data and elimination of any single point of failure, with transparent switchover.



4 High Availability Cluster Checklist 85% 

Tim Burke 

Linux Journal November 2000 

With a variety of clustering services on the market, the ability to determine how well 
options meet your specific business needs is necessary.



5 Recovery in the Calypso file system 83% 

Murthy Devarakonda , Bill Kish , Ajay Mohindra
ACM Transactions on Computer Systems (TOCS) August 1996
Volume 14 Issue 3 

This article presents the design and implementation of the recovery scheme in Calypso.
Calypso is a cluster-optimized, distributed file system for UNIX clusters. As in Sprite
and AFS, Calypso servers are stateful and scale well to a large number of clients. The 
recovery scheme in Calypso is nondisruptive, meaning that open files remain open, 
client modified data are saved, and in-flight operations are properly handled across 
server recovery. The scheme uses distributed state among the client ...



6 An Ethernet compatible low cost/high performance communication 82% 
solution

I. Chlamtac , A. Herman 

ACM SIGCOMM Computer Communication Review , Proceedings of the ACM 
workshop on Frontiers in computer communications technology August 1987 
Volume 17 Issue 5 

The LAN-HUB is a new local area network designed to combine the properties of 
several existing LAN standards to provide highly reliable communication at a relatively 
lower cost per station, improve network capacity/delay performance and increase the 
LAN user's flexibility in configuring his network. The LAN-HUB network is configured 
around the CODEX 4320 LAN-HUB communication controllers which allow up to eight 
Ethernet/IEEE 802.3 stations to transparently share one network transceiver or R ... 



7 A High Availability Clustering Solution 82% 

Phil Lewis 
Linux Journal August 1999

Mr. Lewis tells us how he designed and implemented a simple high-availability solution 
for his company.



8 On calculating connected dominating set for efficient routing in ad hoc 80% 
wireless networks

Jie Wu , Hailan Li 

Proceedings of the 3rd international workshop on Discrete algorithms and
methods for mobile computing and communications August 1999 



9 On estimating end-to-end network path properties 80% 

Mark Allman , Vern Paxson 

ACM SIGCOMM Computer Communication Review , Proceedings of the conference 
on Applications, technologies, architectures, and protocols for computer 
communication August 1999 
Volume 29 Issue 4 

The more information about current network conditions available to a transport 
protocol, the more efficiently it can use the network to transfer its data. In networks 
such as the Internet, the transport protocol must often form its own estimates of 
network properties based on measurements performed by the connection endpoints. 
We consider two basic transport estimation problems: determining the setting of the 
retransmission timer (RTO) for a reliable protocol, and estimating the bandwidth 
availa ... 
10 WELD—an environment for Web-based electronic design 80%

Francis L. Chan , Mark D. Spiller , A. Richard Newton

Proceedings of the 35th annual conference on Design automation conference May 1998

Increasing size and geographical separation of design data and teams has created a 
need for a network-based electronic design environment that is scaleable, adaptable, 
secure, highly available, and cost effective. In the WELD project we are evaluating 
aspects of the network integration and communication infrastructure needed to enable 
such a distributed design environment. The architecture of WELD and the components 
developed to implement the system, together with performance result ... 

11 Floor control for large-scale MBone seminars 80% 

Radhika Malpani , Lawrence A. Rowe

Proceedings of the fifth ACM international conference on Multimedia November
1997 

12 Availability: World wide failures 77%
Werner Vogels 

Proceedings of the 7th workshop on ACM SIGOPS European workshop: Systems 
support for worldwide applications September 1996 

The one issue that unites almost all approaches to distributed computing is the need to 
know whether certain components in the system have failed or are otherwise 
unavailable. When designing and building systems that need to function at a global 
scale, failure management needs to be considered a fundamental building block. This 
paper describes the development of a system-independent failure management 
service, which allows systems and applications to incorporate accurate detection of 
failed proc ... 

13 Low-latency communication on the IBM RISC system/6000 SP 77% 
Chi-Chao Chang , Grzegorz Czajkowski , Chris Hawblitzel , Thorsten von Eicken 

Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM)
November 1996

The IBM SP is one of the most powerful commercial MPPs, yet, in spite of its fast 
processors and high network bandwidth, the SP's communication latency is inferior to 
older machines such as the TMC CM-5 or Meiko CS-2. This paper investigates the use 
of Active Messages (AM) communication primitives as an alternative to the standard 
message passing in order to reduce communication overheads and to offer a good 
building block for higher layers of software. The first part of this paper describe ... 

14 Active middleware services in a decision support system for managing 77% 
highly available distributed resources

Sameh A. Fakhouri , William F. Jerome , Vijay K. Naik , Ajay Raina , Pradeep Varma 
IFIP/ACM International Conference on Distributed systems platforms April 2000 

We describe a decision support system called Mounties that is designed for managing 
applications and resources using rule-based constraints in scalable mission-critical 
clustering environments. Mounties consists of four active service components: (1) a 
repository of resource proxy objects for modeling and manipulating the cluster 
configuration; (2) an event notification mechanism for monitoring and controlling 
interdependent and distributed resources; (3) a rule evaluation and decision proces ... 

15 A network performance tool for grid environments 77% 
Craig A. Lee , James Stepanek , Rich Wolski , Carl Kesselman , Ian Foster
Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM)
January 1999



16 An architecture for a secure service discovery service 77% 
Steven E. Czerwinski , Ben Y. Zhao , Todd D. Hodes , Anthony D. Joseph , Randy H. Katz

Proceedings of the 5th annual ACM/IEEE international conference on Mobile
computing and networking August 1999 



17 Flooding for reliable multicast in multi-hop ad hoc networks 77%

Christopher Ho , Katia Obraczka , Gene Tsudik , Kumar Viswanath 
Proceedings of the 3rd international workshop on Discrete algorithms and 
methods for mobile computing and communications August 1999 



18 Internet routing instability 77% 

Craig Labovitz , G. Robert Malan , Farnam Jahanian 
IEEE/ACM Transactions on Networking (TON) October 1998
Volume 6 Issue 5 



19 Resource aggregation for fault tolerance in integrated services networks 77% 

Constantinos Dovrolis , Parameswaran Ramanathan 
ACM SIGCOMM Computer Communication Review April 1998 
Volume 28 Issue 2 

For several real-time applications it is critical that the failure of a network component 
does not lead to unexpected termination or long disruption of service. In this paper, 
we propose a scheme called RAFT (Resource Aggregation for Fault Tolerance) that 
guarantees recovery in a timely and resource-efficient manner. RAFT is presented in 
the framework of the Reliable Back-bone (RBone), a virtual network layered on top of 
an integrated services network. Applications can request fault tolerance ag ... 



20 Frangipani: a scalable distributed file system 77%

Chandramohan A. Thekkath , Timothy Mann , Edward K. Lee 
ACM SIGOPS Operating Systems Review , Proceedings of the sixteenth ACM
symposium on Operating systems principles October 1997 
Volume 31 Issue 5 